64 research outputs found

    Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting

    Multi-person pose forecasting remains a challenging problem, especially in modeling fine-grained human body interactions in complex crowd scenarios. Existing methods typically represent the whole pose sequence as a temporal series, yet overlook the interactive influences among people at the level of skeletal body parts. In this paper, we propose a novel Trajectory-Aware Body Interaction Transformer (TBIFormer) for multi-person pose forecasting that effectively models body-part interactions. Specifically, we construct a Temporal Body Partition Module that transforms all the pose sequences into a Multi-Person Body-Part sequence, retaining spatial and temporal information based on body semantics. Then, we devise a Social Body Interaction Multi-Head Self-Attention (SBI-MSA) module that uses the transformed sequence to learn body-part dynamics for inter- and intra-individual interactions. Furthermore, departing from prior Euclidean distance-based spatial encodings, we present a novel and efficient Trajectory-Aware Relative Position Encoding for SBI-MSA that offers discriminative spatial information and additional interactive clues. We empirically evaluate our framework on the CMU-Mocap and MuPoTS-3D datasets, as well as synthesized datasets (6 to 10 persons), on both short- and long-term horizons, and show that our method substantially outperforms state-of-the-art methods. Code will be made publicly available upon acceptance.
    Comment: Accepted by CVPR 2023; 8 pages, 6 figures. arXiv admin note: text overlap with arXiv:2208.0922
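
    To make the two key ideas above concrete, here is a minimal, self-contained sketch of body-part tokens combined with a trajectory-aware attention bias in the spirit of the abstract. The five-part joint split, the centroid pooling, and the choice of negative mean trajectory distance as the relative bias are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: body-part tokens + a trajectory-aware attention bias.
import torch
import torch.nn.functional as F

def part_tokens(poses, parts):
    # poses: (P, T, J, 3) pose sequences for P persons, T frames, J joints.
    # parts: list of joint-index lists, one per body part.
    # Each token is one person's per-part centroid trajectory, flattened
    # to T*3 features -> (P * len(parts), T * 3).
    P, T, _, _ = poses.shape
    toks = torch.stack([poses[:, :, idx, :].mean(dim=2) for idx in parts], dim=1)
    return toks.reshape(P * len(parts), T * 3)

def trajectory_bias(tokens, T):
    # Pairwise mean Euclidean distance between centroid trajectories,
    # negated so tokens with nearby trajectories attend more to each
    # other (an assumed stand-in for the paper's encoding).
    traj = tokens.view(tokens.size(0), T, 3)
    return -(traj[:, None] - traj[None, :]).norm(dim=-1).mean(dim=-1)

def social_attention(tokens, T, q_proj, k_proj, v_proj):
    # Single-head self-attention over all body-part tokens of all
    # persons, with the trajectory bias added to the attention logits.
    q, k, v = q_proj(tokens), k_proj(tokens), v_proj(tokens)
    logits = q @ k.t() / q.size(-1) ** 0.5 + trajectory_bias(tokens, T)
    return F.softmax(logits, dim=-1) @ v

# Toy usage: 3 persons, 10 frames, 15 joints grouped into 5 parts.
poses = torch.randn(3, 10, 15, 3)
parts = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14]]
tokens = part_tokens(poses, parts)                     # (15, 30)
proj = lambda: torch.nn.Linear(30, 64)
out = social_attention(tokens, T=10, q_proj=proj(), k_proj=proj(), v_proj=proj())
print(out.shape)  # torch.Size([15, 64])
```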

    PCCNet: A Few-Shot Patch-wise Contrastive Colorization Network

    Few-shot colorization aims to learn a model that colorizes images from little training data. Existing models, however, often fail to keep colors consistent because they ignore the patch correlations of the images. In this paper, we propose PCCNet, a novel Patch-wise Contrastive Colorization Network that learns color synthesis by measuring the similarities and variations of image patches in two aspects: inter-image and intra-image. For the inter-image aspect, we investigate a patch-wise contrastive learning mechanism with positive- and negative-sample constraints to distinguish color features between patches across images. For the intra-image aspect, we explore a new intra-image correlation loss that measures the similarity distribution, revealing structural relations between patches within an image. Furthermore, we propose a novel color memory loss that improves the accuracy with which the memory module stores and retrieves data. Experiments show that our method allows correctly saturated color to spread naturally over objects and achieves higher scores than related methods in quantitative comparisons.
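
    The inter-image mechanism can be sketched as a patch-wise InfoNCE loss: corresponding patches from two images form positive pairs, all other patches act as negatives, and cross-entropy pulls positives together. The feature shapes and temperature below are illustrative assumptions, not PCCNet's exact loss.

```python
# Sketch: patch-wise contrastive (InfoNCE-style) loss across two images.
import torch
import torch.nn.functional as F

def patch_contrastive_loss(feat_a, feat_b, temperature=0.07):
    # feat_a, feat_b: (N, C) features of N corresponding patches from
    # two images; row i of feat_a matches row i of feat_b (positive),
    # every other row is a negative.
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    logits = a @ b.t() / temperature      # (N, N) patch similarities
    targets = torch.arange(a.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: 64 patch features of dimension 128 per image.
fa, fb = torch.randn(64, 128), torch.randn(64, 128)
print(patch_contrastive_loss(fa, fb).item())
```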

    USTNet: Unsupervised Shape-to-Shape Translation via Disentangled Representations

    We propose USTNet, a novel deep learning approach for learning shape-to-shape translation from unpaired domains in an unsupervised manner. The core of our approach is disentangled representation learning that factors the discriminative features of 3D shapes into content and style codes. Given input shapes from multiple domains, USTNet disentangles their representations into style codes, which capture distinctive traits across domains, and content codes, which capture domain-invariant traits. By fusing the style code of the target shape with the content code of the source shape, our method synthesizes new shapes that resemble the target style while retaining the content features of the source. Based on the shared style space, our method also facilitates shape interpolation by manipulating style attributes from different domains. Furthermore, by extending the basic building blocks of our network from two-class to multi-class classification, we adapt USTNet to tackle multi-domain shape-to-shape translation. Experimental results show that our approach generates realistic and natural translated shapes and improves on 3DSNet in quantitative evaluation metrics. Code is available at https://Haoran226.github.io/USTNet
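
    The content/style swap at the heart of this kind of translation can be sketched in a few lines: encode the source shape's content code and the target shape's style code, then decode their concatenation. The MLP encoders, decoder, and code sizes below are placeholders, not USTNet's actual point-cloud architecture.

```python
# Sketch: disentangled content/style encoding with a code swap at decode time.
import torch
import torch.nn as nn

class SwapTranslator(nn.Module):
    def __init__(self, in_dim=1024, content_dim=128, style_dim=8):
        super().__init__()
        self.enc_content = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                         nn.Linear(256, content_dim))
        self.enc_style = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                       nn.Linear(256, style_dim))
        self.dec = nn.Sequential(nn.Linear(content_dim + style_dim, 256),
                                 nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, source, target):
        # Keep the source's domain-invariant content, borrow the
        # target's domain-specific style.
        c = self.enc_content(source)
        s = self.enc_style(target)
        return self.dec(torch.cat([c, s], dim=1))

# Toy usage on flattened shape features; style-space interpolation
# amounts to mixing two style codes before decoding.
net = SwapTranslator()
src, tgt = torch.randn(4, 1024), torch.randn(4, 1024)
print(net(src, tgt).shape)  # torch.Size([4, 1024])
```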

    Co-skeletons: Consistent curve skeletons for shape families

    We present co-skeletons, a new method that computes consistent curve skeletons for 3D shapes from a given family. Our co-skeletons are consistent across the family in terms of sampling density and semantic relevance, while preserving the desired characteristics of traditional per-shape curve skeletonization approaches. We take the curve skeletons extracted by traditional approaches for all shapes of a family as input and compute semantic correlation information for individual skeleton branches, which guides an edge-pruning process via skeleton-based descriptors, clustering, and a voting algorithm. Our approach yields more concise and family-consistent skeletons than traditional per-shape methods. We demonstrate the utility of co-skeletons by applying them to shape segmentation and shape blending on real-world data.
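
    A minimal sketch of the clustering-and-voting pruning step, under simplifying assumptions: each skeleton branch is summarized by a descriptor, descriptors are clustered across the whole family, and a branch is kept only when its cluster recurs in enough of the family's shapes. The descriptor form, the number of clusters, and the 50% support threshold are illustrative choices, not the paper's settings.

```python
# Sketch: family-level branch pruning by descriptor clustering and voting.
import numpy as np
from sklearn.cluster import KMeans

def prune_by_voting(descriptors, shape_ids, n_clusters=6, min_support=0.5):
    # descriptors: (B, D) one descriptor per skeleton branch across the
    # whole family; shape_ids: (B,) index of the shape each branch is from.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(descriptors)
    n_shapes = len(set(shape_ids.tolist()))
    keep = np.zeros(len(labels), dtype=bool)
    for c in range(n_clusters):
        # A cluster "wins the vote" when branches of this kind occur in
        # at least min_support of the shapes in the family.
        support = len({s for s, l in zip(shape_ids, labels) if l == c})
        if support / n_shapes >= min_support:
            keep |= labels == c
    return keep

# Toy usage: 40 branches with 16-dim descriptors from 5 shapes.
rng = np.random.default_rng(0)
desc = rng.normal(size=(40, 16))
ids = rng.integers(0, 5, size=40)
print(prune_by_voting(desc, ids).sum(), "branches kept")
```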

    Learning Weakly Supervised Audio-Visual Violence Detection in Hyperbolic Space

    In recent years, the task of weakly supervised audio-visual violence detection has gained considerable attention. The goal is to identify violent segments within multimodal data from video-level labels alone. Despite advances in this field, the traditional Euclidean neural networks used in prior research have difficulty capturing highly discriminative representations because of limitations of the feature space. To overcome this, we propose HyperVD, a novel framework that learns snippet embeddings in hyperbolic space to improve model discrimination. Our framework comprises a detour fusion module for multimodal fusion, which effectively alleviates the modality inconsistency between audio and visual signals. Additionally, we contribute two branches of fully hyperbolic graph convolutional networks that mine feature similarities and temporal relationships among snippets in hyperbolic space. By learning snippet representations in this space, the framework effectively captures the semantic discrepancies between violent and normal events. Extensive experiments on the XD-Violence benchmark demonstrate that our method outperforms state-of-the-art methods by a sizable margin.
    Comment: 8 pages, 5 figures
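
    A minimal sketch of the hyperbolic ingredient: projecting Euclidean snippet features onto the Poincaré ball with the exponential map at the origin and comparing them with the geodesic distance, using the standard curvature c = 1 formulas. The rest of the framework (detour fusion, the hyperbolic GCN branches) is omitted here.

```python
# Sketch: Poincare-ball embedding of snippet features (curvature c = 1).
import torch

def expmap0(x, eps=1e-6):
    # Map a Euclidean (tangent) vector at the origin onto the unit ball.
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(norm) * x / norm

def poincare_dist(u, v, eps=1e-6):
    # Geodesic distance on the ball; it grows rapidly near the boundary,
    # which is what makes the space discriminative for embeddings.
    sq = ((u - v) ** 2).sum(-1)
    du = 1 - (u ** 2).sum(-1)
    dv = 1 - (v ** 2).sum(-1)
    arg = 1 + 2 * sq / (du * dv).clamp_min(eps)
    return torch.acosh(arg.clamp_min(1 + eps))

# Toy usage: embed two snippet features and compare them in hyperbolic
# rather than Euclidean geometry.
a = expmap0(0.1 * torch.randn(128))
b = expmap0(0.1 * torch.randn(128))
print(poincare_dist(a, b).item())
```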